The main disadvantages of independent checkpointing in message-passing systems are the
possible domino effect and the associated storage space overhead for maintaining multiple
checkpoints. In most previous research on checkpointing and recovery, it has been assumed
that only the checkpoints older than the global recovery line can be discarded. In this paper,
we generalize the notion of a recovery line to that of a potential recovery line. Only the
checkpoints belonging to at least one of the potential recovery lines cannot be discarded.
By using the model of maximum-sized antichains on a partially ordered set, an efficient
algorithm is developed for finding all non-discardable checkpoints and an upper bound on the
number of non-discardable checkpoints is derived. Communication trace-driven simulation
for several parallel programs is used to show the benefits of the proposed algorithm for real
applications.

CAPTIONS FOR FIGURES AND TABLES

Figure 1: Checkpoint consistency. (a) message received but not yet sent; (b) message
sent but not yet received.

Figure 2: Example checkpoint graph. (a) the checkpoint and communication pattern;
(b) the corresponding checkpoint graph.

Figure 3: The rollback propagation algorithm.

Figure 4: (a) The extended checkpoint graph when $p_0$ initiates the recovery; (b) $CP_{2,2}$
and all virtual checkpoints are deleted from the checkpoint graph after the recovery.

Figure 5: Construction of the potential supergraph $\hat{G}$ by adding $n_i$'s to $G$.

Figure 6: Transforming the potential recovery line by replacing $M_2$ with $B_2$.

Figure 7: Transforming the potential recovery line by replacing $M_3$ with $M_4$.

Figure 8: The Predictive Checkpoint Space Reclamation Algorithm.

Figure 9: Execution of the PCSR algorithm (a) $\hat{G} - n_0$ (b) $\hat{G} - n_1$ (c) $\hat{G} - n_2$ (d) $\hat{G} - n_3$.
(Shaded checkpoints belong to the recovery lines and checkpoints marked “X” are
discardable.)

Figure 10: $G_N$: the checkpoint graph with $N(N+1)/2$ non-discardable checkpoints.

Figure 11: Non-obsolete checkpoints and non-discardable checkpoints for the Cell
placement program.

Figure 12: Non-obsolete checkpoints and non-discardable checkpoints for the Channel
router program.

Figure 13: Non-obsolete checkpoints and non-discardable checkpoints for the Knight tour
program.

Figure 14: Non-obsolete checkpoints and non-discardable checkpoints for the N-queen
program.

Table 1: Execution and checkpoint parameters of the programs.


I  INTRODUCTION 


Numerous checkpointing and rollback recovery techniques have been proposed in the
literature for message-passing systems. They can be classified into two basic categories.
Coordinated checkpointing schemes synchronize computation with checkpointing by
coordinating processors during a checkpointing session in order to maintain a consistent set of
checkpoints [4-6]. Each processor only keeps the most recent successful checkpoint, and
rollback propagation is avoided at the cost of potentially significant performance degradation
during normal execution. Independent checkpointing schemes replace the above
synchronization by dependency tracking and possibly message logging [7-10] in order to preserve
process autonomy. Possible rollback propagation in case of a fault is handled by searching
for a consistent system state based on the dependency information. Lower run-time
overhead during normal execution is achieved by maintaining multiple checkpoints and allowing
slower recovery.

This paper considers independent checkpointing schemes for possibly nondeterministic
execution. Most research on this subject has concentrated on algorithms for finding
the latest consistent set of checkpoints, i.e., the recovery line, during rollback recovery. The
same algorithms can be applied to the set of existing checkpoints during normal execution
to find the global recovery line. All the checkpoints older than the global recovery line then
become obsolete checkpoints and can therefore be discarded. When the domino effect [11]
occurs, a potentially large number of non-obsolete checkpoints have to be kept on stable
storage, which results in a large space overhead.

Our approach is based on the observation that many non-obsolete checkpoints can also
be discarded because they will never become members of any future recovery line¹. The
notion of a recovery line is generalized to that of a potential recovery line. A checkpoint is
non-discardable if and only if it belongs to at least one of the potential recovery lines. By
modeling a recovery line as the maximal maximum-sized antichain on a partially ordered
set, an efficient algorithm is developed for finding the set of non-discardable checkpoints.
An upper bound on the size of this set given the number of processors is also derived to
show that even when the domino effect persists during program execution, the space overhead
for maintaining multiple checkpoints will not grow without limit.

¹ Our goal is similar to the idea of discarding useless recovery points at the earliest possible
time described in the papers by Kim et al. [1,2].

The outline of the paper is as follows. Section II describes the system model and the
checkpointing and recovery protocol; Section III gives our mathematical model of recovery
lines; Section IV presents the necessary and sufficient conditions for a checkpoint to be
non-discardable; the checkpoint space reclamation algorithm is developed in Section V; the
maximum number of non-discardable checkpoints is derived in Section VI; and the
experimental evaluation is described in Section VII.


II SYSTEM MODEL AND CHECKPOINT CONSISTENCY

A.  Checkpointing  and  Rollback  Recovery  Protocol 

The system considered in this paper consists of a number of concurrent processes for which
all process communication is through message passing. Processes are assumed to run on
fail-stop processors [12] and each processor is considered as an individual recovery unit [8].
Since studies [13] have shown that the support for nondeterministic processes is important for
practical applications, we do not assume deterministic execution. Consequently, if the sender
of a message is rolled back, the corresponding message will be invalid during reexecution,
which means that the receiver also has to be rolled back. We do not address the problem of
concurrent rollbacks due to multiple failures [9].

During normal execution, the state of each processor is occasionally saved as a checkpoint
on stable storage and can be reloaded for rollback recovery in case of a detected error. Let
$CP_{i,k}$ denote the kth checkpoint of processor $p_i$, with $k \ge 0$ and $0 \le i \le N-1$, where $N$
is the number of processors. A checkpoint interval is defined to be the time between two
consecutive checkpoints on the same processor, and the interval between $CP_{i,k}$ and $CP_{i,k+1}$
is called the kth checkpoint interval. Each message is tagged with the current checkpoint
interval number and the processor number of the sender. Each processor takes its
checkpoints independently, i.e., without synchronizing with any other processor. Each checkpoint
includes communication information (or input information [9]) containing pairs of the
processor number and checkpoint interval number, $(j, m)$, if at least one message from the mth
checkpoint interval of processor $p_j$ has been received during the previous checkpoint interval.

A centralized checkpoint space reclamation algorithm can be invoked by any processor
periodically to reduce the space overhead by removing discardable checkpoints. First, the
communication information of all the existing checkpoints is collected to construct the
checkpoint graph. The rollback propagation algorithm (described later) is executed on the
checkpoint graph to determine the global recovery line [7]. All the checkpoints taken before the
global recovery line then become obsolete and their space can therefore be reclaimed.

When processor $p_i$ detects an error, it starts a two-phase centralized recovery procedure
[9]. First, a rollback-initiating message is sent to every other processor to request the
up-to-date communication information. Each surviving processor takes a virtual checkpoint upon
receiving the rollback-initiating message so that the communication information during the
most recent checkpoint interval is also collected. After receiving the responses, $p_i$ constructs
the extended checkpoint graph [7] and executes the rollback propagation algorithm to
determine the local recovery line. A rollback-request message containing the local recovery line is
then sent to each processor, requesting the involved processors to roll back and restart.

B.  Checkpoint  Consistency 

There are two situations concerning the consistency between two checkpoints. In Fig. 1(a),
if processors $p_i$ and $p_j$ restart from the checkpoints $CP_{i,k}$ and $CP_{j,m}$ respectively, the message
m is recorded as “received but not yet sent”. In a general model without the assumption of
deterministic execution, message m becomes an orphan message [14]. $CP_{i,k}$ and $CP_{j,m}$ are
thus inconsistent.
Figure 1: Checkpoint consistency. (a) message received but not yet sent; (b) message sent
but not yet received.

Fig. 1(b) illustrates the second situation. The message m is recorded as “sent but not yet
received” according to the system state containing $CP_{i,k}$ and $CP_{j,m}$. By defining the state of
the channels to be the set of messages sent but not yet received, it has been shown [5, 15]
that checkpoints like $CP_{i,k}$ and $CP_{j,m}$ can be considered consistent if the corresponding state
of the channels is also recorded. In Koo and Toueg’s paper [6], such a state was assumed
to be recorded at the sender side in the form of lost messages and the set of messages was
guaranteed to be re-delivered reliably by some end-to-end transmission protocol. Another
way of recording the channel state is through message logging. Pessimistic logging protocols
[16-18] can ensure such a state is properly recorded at the receiving end². As a result, we
consider the situation in Fig. 1(b) as consistent.


III RECOVERY LINES


A.  Partially  Ordered  Sets  and  Checkpoint  Graphs 

In a message-passing system, an event a happens before event b [19] if and only if

1. a and b are events on the same processor, and a occurs before b; or

2. a is the sending of a message by one processor and b is the receiving of the same
message by another processor; or

3. a happens before c and c happens before b.

² Extension of our work to systems with optimistic logging protocols is considered elsewhere [3].

The set of events with the “happens before” relation forms a partially ordered set, or poset
[19]. When dealing with the problem of finding a consistent set of checkpoints, we only
consider the induced subposet [20] $P = (C, <)$, where $C$ is the set of all checkpoints and “<”
is the “happens before” relation.

A checkpoint graph (CPG), of which the transitive closure is the poset $P$, is a directed
acyclic graph constructed as follows [7]. Each vertex on the checkpoint graph represents a
checkpoint. A directed edge exists from vertex $CP_{i,k}$ to vertex $CP_{j,m}$ if $j = i$ and $m = k + 1$,
or $j \ne i$ and there exists a message sent from the kth checkpoint interval of processor $p_i$ and
received by processor $p_j$ at the (m − 1)th checkpoint interval. Fig. 2 gives an example of a
CPG with its corresponding communication pattern.


Figure 2: Example checkpoint graph. (a) the checkpoint and communication pattern; (b)
the corresponding checkpoint graph.
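The edge rule for the CPG can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `build_cpg`, the tuple encoding `(i, k)` for $CP_{i,k}$, and the message record format `(i, k, j, r)` are hypothetical choices made here, not part of the protocol described in the paper.

```python
def build_cpg(num_checkpoints, messages):
    """Build a checkpoint graph as {vertex: set of successors}.

    num_checkpoints[i] -- number of checkpoints CP_{i,0}, CP_{i,1}, ... of p_i
    messages           -- tuples (i, k, j, r): sent in the k-th checkpoint
                          interval of p_i, received in the r-th interval of p_j
    (Both encodings are assumptions of this sketch.)
    """
    edges = {(i, k): set()
             for i, count in enumerate(num_checkpoints)
             for k in range(count)}
    # Same-processor edges: CP_{i,k} -> CP_{i,k+1}.
    for (i, k) in list(edges):
        if (i, k + 1) in edges:
            edges[(i, k)].add((i, k + 1))
    # Message edges: received in interval m-1 of p_j  =>  CP_{i,k} -> CP_{j,m}.
    for i, k, j, r in messages:
        target = (j, r + 1)
        # A message whose receiving interval is not yet closed by a
        # checkpoint has no target vertex and contributes no edge.
        if i != j and (i, k) in edges and target in edges:
            edges[(i, k)].add(target)
    return edges


cpg = build_cpg([2, 2], [(0, 1, 1, 0), (0, 1, 1, 1)])
```

In this example the second message is received in an interval of $p_1$ that has no closing checkpoint yet, so it adds no edge to the graph.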


Most of the ideas in this paper will be illustrated by the CPG instead of the more abstract
poset. An element a in a poset is maximal (minimal) if there does not exist any element b
such that a < b (b < a); correspondingly, a vertex in a CPG will be referred to as maximal
(minimal) if it has no outgoing (incoming) edge. Also, the following terminology will be used
interchangeably: a < b, a is “smaller than” b, b is “greater than” a, a can “strictly reach” b
and b is “strictly reachable from” a.

B.  Maximum-Sized  Antichains  and  Recovery  Lines 

A partial ordering of a set $S$ is linear if for every two elements a and b in $S$, either $a \le b$ or
$b \le a$ [20]. In a poset, a subset whose elements are linearly ordered is called a chain and a
set of elements, no two of which are comparable, is called an antichain. In particular, a set
of any number of maximal (minimal) elements clearly forms an antichain. The antichains
with the largest number of elements are called the maximum-sized antichains or M-chains
for short. Let $\mathcal{A}(Q)$ denote the set of antichains on a poset $Q$ and, for $A, B \in \mathcal{A}(Q)$, define
$A \preceq B$ if and only if for all $a \in A$ there exists $b \in B$ such that $a \le b$ [21]. Also let $\mathcal{M}(Q)$
denote the set of maximum-sized antichains. We then have the following properties [21, 22].

LEMMA 1

(a) $(\mathcal{A}(Q), \preceq)$ forms a poset;

(b) $(\mathcal{A}(Q), \preceq)$ is a lattice and its subposet $(\mathcal{M}(Q), \preceq)$ is a sublattice;

(c) For $M_1, M_2 \in \mathcal{M}(Q)$, the join (least upper bound) $M_1 \vee M_2 = \max(M_1 \cup M_2)$ and the
meet (greatest lower bound) $M_1 \wedge M_2 = \min(M_1 \cup M_2)$, where $\max(S)$ denotes the set of all
maximal elements in $S$ and $\min(S)$ is similarly defined.

Since $(\mathcal{M}(Q), \preceq)$ is a finite lattice, there must exist a unique maximum member $M^*(Q)$
[21], called the maximal maximum-sized antichain [23] or MM-chain, such that $M \preceq M^*(Q)$
for every $M \in \mathcal{M}(Q)$.

LEMMA 2 For any $M \in \mathcal{M}(Q)$, there must not exist any $a \in M^*(Q)$ such that $a < b$ for
some $b \in M$.

Proof. Suppose there exist such $a \in M^*(Q)$ and $b \in M$. $M \preceq M^*(Q)$ implies there exists
$c \in M^*(Q)$ such that $b \le c$. Together with $a < b$, this leads to $a < c$, contradicting the fact
that $M^*(Q)$ is an antichain. □

In this paper, we define a global checkpoint as a set of $N$ checkpoints, one from each
processor. Based on the description of consistency in the previous section, a consistent
global checkpoint is a set of checkpoints, one from each processor and no two of which are
comparable through the “happens before” relation. A recovery line refers to the “most
recent” consistent global checkpoint. Since one special feature of the poset $P = (C, <)$
is that there always exists a natural chain decomposition [21] $\{C_0, C_1, \ldots, C_{N-1}\}$ where $C_i$
consists of all checkpoints of processor $p_i$, the size $d(P)$ of the M-chains cannot be greater
than $N$. Furthermore, because the first checkpoint of every processor must be minimal and
the set of such checkpoints always forms an antichain of size $N$, $d(P)$ is in fact equal to
$N$ and each M-chain will consist of $N$ elements, one from each $C_i$. It becomes clear that
each M-chain is equivalent to a consistent global checkpoint. Since it is always desirable to
roll back to the most recent consistent global checkpoint in order to minimize the recovery
cost, Lemma 1 guarantees the existence and uniqueness of such a recovery line, i.e., the
MM-chain.
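For a small CPG, the correspondence between M-chains and consistent global checkpoints can be checked by exhaustive enumeration. The following Python sketch is a brute-force illustration only, not the algorithm used in the paper; the graph encoding `{(i, k): successors}` and the function names are assumptions made here. It uses Lemma 1(c): since every M-chain has one element per processor, the join of all M-chains takes the latest member on each processor.

```python
from itertools import product


def strict_reach(edges):
    """Map every vertex to the set of vertices strictly reachable from it."""
    reach = {}

    def visit(v):
        if v not in reach:
            reach[v] = set()
            for w in edges[v]:
                reach[v] |= {w} | visit(w)
        return reach[v]

    for v in edges:
        visit(v)
    return reach


def m_chains(edges, n):
    """All antichains with one checkpoint per processor (the M-chains)."""
    reach = strict_reach(edges)
    chains = [sorted(v for v in edges if v[0] == i) for i in range(n)]
    return [combo for combo in product(*chains)
            if all(b not in reach[a] for a in combo for b in combo)]


def mm_chain(edges, n):
    """Join of all M-chains: the latest member on every processor."""
    found = m_chains(edges, n)
    return tuple((i, max(m[i][1] for m in found)) for i in range(n))


# Two processors, two checkpoints each, one message edge CP_{0,1} -> CP_{1,1}.
cpg = {(0, 0): {(0, 1)}, (0, 1): {(1, 1)}, (1, 0): {(1, 1)}, (1, 1): set()}
print(m_chains(cpg, 2))
print(mm_chain(cpg, 2))
```

The enumeration is exponential in the number of processors and is meant only to make the definitions concrete on toy inputs.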

C.  Ideals,  Filters  and  the  Reachable  Sets 

Given a poset $P$, if $\mathcal{I}$ is a set of elements of $P$ with the property

$a \in \mathcal{I}$ and $b \le a \Rightarrow b \in \mathcal{I}$,

$\mathcal{I}$ is called an ideal or a down-set of $P$. Similarly, a filter or an up-set, $\mathcal{F}$, of $P$ is a set of
elements such that if $a \in \mathcal{F}$ and $a \le b$, then $b \in \mathcal{F}$.

For an antichain $A$ in $P$, define

$\mathcal{I}(A) = \{x \in P : x \le a \text{ for some } a \in A\}$

$\mathcal{F}(A) = \{x \in P : a \le x \text{ for some } a \in A\}$.

Then $\mathcal{I}(A)$ is an ideal [21] and $\mathcal{F}(A)$ is a filter.


LEMMA 3 If $A$ and $B$ are antichains, then [21]

(a) $\mathcal{I}(A) \subseteq \mathcal{I}(B) \iff A \preceq B$;

(b) $\mathcal{F}(A) \subseteq \mathcal{F}(B) \iff B \preceq A$.

In terms of the CPG, the set of vertices which can reach any vertex in an antichain $A$ is
equal to $\mathcal{I}(A)$ and the set of vertices reachable from any vertex in $A$ is equal to $\mathcal{F}(A)$.
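Because ideals and filters of an antichain reduce to graph reachability, they are easy to compute on a CPG. The sketch below is a minimal illustration, assuming the graph is stored as a dictionary from each vertex to its set of successors; the names `filter_of` and `ideal_of` are hypothetical.

```python
def filter_of(edges, antichain):
    """F(A): A together with every vertex reachable from A."""
    seen, stack = set(), list(antichain)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(edges[v])
    return seen


def ideal_of(edges, antichain):
    """I(A): A together with every vertex that can reach A."""
    reverse = {v: set() for v in edges}
    for v, succs in edges.items():
        for w in succs:
            reverse[w].add(v)
    # Reaching A in the graph is the same as being reachable from A
    # in the graph with all edges reversed.
    return filter_of(reverse, antichain)


cpg = {(0, 0): {(0, 1)}, (0, 1): {(1, 1)}, (1, 0): {(1, 1)}, (1, 1): set()}
A = {(0, 0), (1, 0)}
B = {(0, 1), (1, 0)}
# Here A precedes B in the ordering on antichains, so Lemma 3 predicts
# I(A) to be contained in I(B) and F(B) to be contained in F(A).
print(ideal_of(cpg, A) <= ideal_of(cpg, B), filter_of(cpg, B) <= filter_of(cpg, A))
```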

D.  The  Rollback  Propagation  Algorithm 

The algorithm for finding the recovery line will form the basis of our checkpoint space
reclamation algorithm. The problem of finding the MM-chain in a general poset can be
transformed into that of finding a maximum matching on a bipartite graph [23]. For the
poset $P = (C, <)$ in our problem, a simpler rollback propagation algorithm, shown in Fig. 3,
has been proposed [7] and applied to the CPG. The complexity of the algorithm is linear in
the number of edges because each edge can be removed after it is used to reach some vertex
and is therefore visited at most once.


/* CP stands for checkpoint */
/* Initially, all the CPs are unmarked */

include the latest CP of each processor in the root set;
mark all CPs strictly reachable from any CP in the root set;
while (at least one CP in the root set is marked) {
    replace each marked CP in the root set by the latest unmarked CP on the
    same processor;
    mark all CPs strictly reachable from any CP in the root set;
}
the root set is the recovery line.

Figure 3: The rollback propagation algorithm.
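The pseudocode of Fig. 3 translates almost line by line into Python. The sketch below is a straightforward, unoptimized rendering: it recomputes reachability on every pass instead of removing visited edges, so it does not achieve the linear bound stated in the text. The graph encoding and the function names are assumptions of this sketch, and it relies on the first checkpoint of every processor having no incoming edge, as holds for any CPG built by the rule of Section III.A.

```python
def strictly_reachable(edges, roots):
    """All vertices reachable from some root by at least one edge."""
    seen = set()
    stack = [w for r in roots for w in edges[r]]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(edges[v])
    return seen


def recovery_line(edges, latest):
    """latest[i] is the index of the latest checkpoint of processor p_i."""
    root = dict(enumerate(latest))            # processor -> checkpoint index
    marked = strictly_reachable(edges, root.items())
    while any(v in marked for v in root.items()):
        for i in root:
            # Step back to the latest unmarked checkpoint of p_i.
            while (i, root[i]) in marked:
                root[i] -= 1
        marked |= strictly_reachable(edges, root.items())
    return sorted(root.items())


# A domino-effect example: CP_{0,1} -> CP_{1,1} and CP_{1,0} -> CP_{0,1}.
domino = {(0, 0): {(0, 1)}, (0, 1): {(1, 1)},
          (1, 0): {(1, 1), (0, 1)}, (1, 1): set()}
print(recovery_line(domino, [1, 1]))
```

In the domino example both processors are forced back to their initial checkpoints.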
Having introduced the checkpoint graphs and the rollback propagation algorithm, we now
give an example illustrating the checkpoint space reclamation algorithm and the recovery
protocol. The global recovery line for the checkpoint graph in Fig. 2(b) is indicated by the
shaded vertices. The four checkpoints before the global recovery line are obsolete. Suppose
the extended checkpoint graph when $p_0$ initiates the recovery is as shown in Fig. 4(a).
The checkpoint graph after recovery (Fig. 4(b)) is then obtained by deleting the virtual
checkpoints and all the checkpoints after the local recovery line.


Figure 4: (a) The extended checkpoint graph when $p_0$ initiates the recovery; (b) $CP_{2,2}$ and
all virtual checkpoints are deleted from the checkpoint graph after the recovery.


IV  POTENTIAL  RECOVERY  LINES 


Let $\mathcal{G}_f(G)$ denote the set of all future graphs of a checkpoint graph $G$ which can possibly
evolve from $G$ during program execution in the future. We define a potential recovery line
of $G$ as the recovery line of some checkpoint graph $G'$ where $G' \in \mathcal{G}_f(G)$. Since the purpose
of keeping checkpoints is for possible future recovery, a checkpoint is discardable if and only
if it does not belong to any potential recovery line. Being obsolete is simply a sufficient
condition for being discardable but not a necessary condition. We will show that there exist
discardable non-obsolete checkpoints.

Although the execution time for a typical program is finite, the number of future graphs
of any given checkpoint graph is enormous because the communication pattern and the
error occurrence are in general unpredictable. By characterizing the possible evolution of
a checkpoint graph, we are able to reduce this enormous number of situations to a finite,
manageable number of cases for the problem of identifying the minimum number of
non-discardable checkpoints.

A.  Adjoining  New  Vertices  During  Normal  Execution 

During normal execution, the size of the checkpoint graph increases as new checkpoints are
taken. Because checkpoint graphs represent program dependency, the following rules must
be satisfied when adding new vertices. For each new vertex $CP_{i,k}$ with $k \ge 1$,

Rule 1: $CP_{i,k}$ must have an incoming edge from $CP_{i,k-1}$;

Rule 2: $CP_{i,k}$ can have incoming edges from an arbitrary number of existing vertices, but
it cannot have any outgoing edge to any existing vertex.

Note that a checkpoint $CP_{i,k}$ that “happens before” $CP_{j,m}$ may not be collected before
$CP_{j,m}$ because of the unpredictable message transmission delay during the collection process.
However, such a situation can be detected by the communication information. If a vertex
$CP_{j,m}$ is expecting an incoming edge from a non-existing vertex $CP_{i,k}$, $CP_{j,m}$ and its associated
incoming edges will be excluded from the existing CPG. By adding each new vertex under
this constraint, none of the new vertices can have edges pointing to any existing vertex and,
therefore, Rule 2 is enforced. The following important property is ensured by Rule 2.

PROPERTY 1 Adding a new vertex $v$ and its associated incoming edges to an existing
CPG cannot change the relation³ between any pair of existing vertices.

Proof. The relation between any pair of existing vertices will be changed only if one
vertex is smaller than $v$ and the other one is greater than $v$. However, Rule 2 guarantees
none of the existing vertices is greater than $v$. Therefore, the property holds. □

³ The comparability through the “happens before” relation.
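Rules 1 and 2 can be enforced mechanically when a collected checkpoint is adjoined to the graph. The following Python sketch shows one way to do this under assumed conventions: the graph is a dictionary from vertex `(i, k)` to its set of successors, and `add_checkpoint` and its `senders` parameter are hypothetical names introduced here. A vertex whose expected predecessor has not been collected yet is rejected, mirroring the exclusion described above.

```python
def add_checkpoint(edges, i, k, senders=()):
    """Adjoin CP_{i,k}; `senders` are existing vertices with an edge into it."""
    new = (i, k)
    if new in edges:
        raise ValueError("vertex already present")
    if k >= 1 and (i, k - 1) not in edges:
        raise ValueError("Rule 1: CP_{i,k-1} has not been collected yet")
    if any(u not in edges for u in senders):
        raise ValueError("expected predecessor missing; exclude vertex for now")
    edges[new] = set()                 # Rule 2: no outgoing edge to old vertices
    if k >= 1:
        edges[(i, k - 1)].add(new)     # Rule 1: edge from CP_{i,k-1}
    for u in senders:
        edges[u].add(new)              # Rule 2: arbitrary incoming edges
    return edges


cpg = {(0, 0): set(), (1, 0): set()}
add_checkpoint(cpg, 0, 1, senders=[(1, 0)])
```

Since the new vertex starts with an empty successor set and only receives incoming edges, no path between two existing vertices can pass through it, which is the content of Property 1.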

Let $\mathcal{G}_s(G)$ denote the set of all potential supergraphs obtainable by adjoining new vertices
to a given checkpoint graph $G$ according to the above rules. Lemma 4 gives the relationship
between the antichains of $G$ and those of its potential supergraphs.

LEMMA 4 Given checkpoint graphs $G = (V, E)$ and $G' \in \mathcal{G}_s(G)$,

(a) $\mathcal{A}(G) \subseteq \mathcal{A}(G')$;

(b) $A \in \mathcal{A}(G')$ and $A \subseteq V \Rightarrow A \in \mathcal{A}(G)$;

(c) $\mathcal{M}(G) \subseteq \mathcal{M}(G')$;

(d) $M \in \mathcal{M}(G')$ and $M \subseteq V \Rightarrow M \in \mathcal{M}(G)$;

(e) $M^*(G) \preceq M^*(G')$.

Proof. (a) and (b) follow immediately from Property 1. By Rule 1 and the discussion
after Lemma 2, the size of the maximum-sized antichains is always fixed and equal to the
number of processors, thus (c) and (d) hold. In particular, $M^*(G) \in \mathcal{M}(G) \subseteq \mathcal{M}(G')$
implies $M^*(G) \preceq M^*(G')$. □

One special potential supergraph of $G$, denoted $\hat{G}$, will play a very important role throughout
this paper and is constructed as follows:

1. adjoin $N$ new vertices $n_0, n_1, \ldots, n_{N-1}$ to $G$;

2. an edge is added from the last vertex $e_i$ on each chain $C_i$ to $n_i$ as shown in Fig. 5.

Let $U$ denote the set of all such $n_i$'s. We now prove the following lemma which relates the
MM-chain of any potential supergraph to the MM-chain of one of the subgraphs of $\hat{G}$.

LEMMA 5 Given a checkpoint graph $G = (V, E)$ and $v \in V$, if $v \in M^*(G')$ for some
$G' = (V', E') \in \mathcal{G}_s(G)$, then $v \in M^*(\hat{G} - W)$ for some $W \subseteq U$.
Figure 5: Construction of the potential supergraph $\hat{G}$ by adding $n_i$'s to $G$.

Proof. If $v \in M^*(G')$ for some $G' \in \mathcal{G}_s(G)$, let $M^*(G') = M_1 \cup M_2$ such that $M_1 =
M^*(G') \cap V$ and $M_2 = M^*(G') \setminus M_1$ as shown in Fig. 6(a). Clearly, $v \in M_1$. Define $p(u) = i$
if $u$ represents a checkpoint of processor $p_i$ and partition the set $U$ as $U = B_1 \cup B_2$ where

$B_1 = \{n_{p(u)} : u \in M_1\}$, $B_2 = \{n_{p(u)} : u \in M_2\}$.

We want to show that $M_1 \cup B_2 = M^*(\hat{G} - B_1)$ (Fig. 6(b)).

First we prove $M_1 \cup B_2 \in \mathcal{M}(\hat{G} - B_1)$. Consider the graph $G'$. For every $u \in M_2$,
$e_{p(u)} < u$ by Rule 1. According to Lemma 3, $\mathcal{I}(e_{p(u)}) \subseteq \mathcal{I}(u)$. Since $u$ and all the vertices
in $M_1$ belong to the same antichain, $M_1 \cap \mathcal{I}(u) = \emptyset$. It follows that $M_1 \cap \mathcal{I}(e_{p(u)}) = \emptyset$.
Now consider the graph $\hat{G} - B_1$. $M_1 \cap \mathcal{I}(e_{p(u)}) = \emptyset$ still holds because of Property 1. By
the construction of $\hat{G}$, $\mathcal{I}(n_{p(u)}) = \mathcal{I}(e_{p(u)}) \cup \{n_{p(u)}\}$ and therefore $M_1 \cap \mathcal{I}(n_{p(u)}) = \emptyset$. Since
$n_{p(u)}$ is maximal and so $M_1 \cap \mathcal{F}(n_{p(u)}) = \emptyset$, we have proved that every vertex $n_{p(u)}$ in $B_2$ is
incomparable with every vertex in $M_1$. Because $M_1$ is an antichain by itself and all $n_{p(u)}$'s
in $B_2$ are maximal, $M_1 \cup B_2 \in \mathcal{M}(\hat{G} - B_1)$.

Next we prove $M_1 \cup B_2 = M^*(\hat{G} - B_1)$ by contradiction. Because every vertex in $B_2$ is
maximal on the chain it belongs to, $B_2 \subseteq M^*(\hat{G} - B_1)$ by Lemma 2. Suppose $M_1 \cup B_2 \ne
M^*(\hat{G} - B_1)$. There must exist $M_1' = M^*(\hat{G} - B_1) \setminus B_2$ such that $M_1' \subseteq V$, $M_1 \preceq M_1'$
and $M_1 \ne M_1'$. We then have $\mathcal{F}(M_1') \subseteq \mathcal{F}(M_1)$ by Lemma 3. Now consider the graph $G'$.
Figure 6: Transforming the potential recovery line by replacing $M_2$ with $B_2$.

Recall $M_1$ and $M_2$ form an antichain in the graph $G'$, which implies $M_2 \cap \mathcal{F}(M_1) = \emptyset$. Thus
$M_2 \cap \mathcal{F}(M_1') = \emptyset$ and, because Rule 2 guarantees $M_2 \cap \mathcal{I}(M_1') = \emptyset$, $M_1' \cup M_2 \in \mathcal{M}(G')$. The
fact that $M_1' \cup M_2$ is a greater M-chain than $M_1 \cup M_2$ in $G'$ contradicts $M^*(G') = M_1 \cup M_2$.
Hence, $M_1 \cup B_2 = M^*(\hat{G} - B_1)$. It immediately follows that if $v \in M^*(G')$ for some
$G' \in \mathcal{G}_s(G)$, $v \in M_1 \subseteq M^*(\hat{G} - B_1)$ for $B_1 \subseteq U$. □

The proof of Lemma 5 shows that although the number of potential supergraphs of a
checkpoint graph $G$ is infinite, the recovery lines of these potential supergraphs can only
intersect $G$ in a finite number of ways. Furthermore, each of the possible intersections must
be part of the recovery line for one of the $2^N$ graphs $\hat{G} - W$, $W \subseteq U$.
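The finiteness argument suggests a direct, if exponential, procedure: compute the recovery line of each of the $2^N$ graphs $\hat{G} - W$ and collect the checkpoints of $G$ that appear on any of them. The Python sketch below does exactly that and nothing more. It is a brute-force illustration under stated assumptions (dictionary graph encoding, hypothetical function names, vertex $n_i$ modelled as one extra vertex at the end of chain $i$), not the efficient reclamation algorithm the paper develops. By Lemma 5, the returned set contains every checkpoint of $G$ that lies on the recovery line of some potential supergraph.

```python
from itertools import combinations


def strictly_reachable(edges, roots):
    seen = set()
    stack = [w for r in roots for w in edges[r]]
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(edges[v])
    return seen


def recovery_line(edges, root):
    """Rollback propagation; root maps processor -> index of its last vertex."""
    root = dict(root)
    marked = strictly_reachable(edges, root.items())
    while any(v in marked for v in root.items()):
        for i in root:
            while (i, root[i]) in marked:
                root[i] -= 1
        marked |= strictly_reachable(edges, root.items())
    return set(root.items())


def possible_line_members(edges, counts):
    """Union over all W of the real checkpoints on the recovery line of G^ - W.

    counts[i] is the number of checkpoints of p_i; the added vertex n_i is
    modelled (an assumption of this sketch) as the extra vertex (i, counts[i]).
    """
    n = len(counts)
    members = set()
    for size in range(n + 1):
        for removed in combinations(range(n), size):    # `removed` plays W
            graph = {v: set(s) for v, s in edges.items()}
            root = {}
            for i, count in enumerate(counts):
                root[i] = count - 1                     # chain ends at e_i
                if i not in removed:                    # keep n_i in the graph
                    graph[(i, count - 1)].add((i, count))
                    graph[(i, count)] = set()
                    root[i] = count
            line = recovery_line(graph, root)
            members |= {v for v in line if v in edges}  # drop the n_i vertices
    return members


cpg = {(0, 0): {(0, 1)}, (0, 1): {(1, 1)}, (1, 0): {(1, 1)}, (1, 1): set()}
print(sorted(possible_line_members(cpg, [2, 2])))
```

In this small example $CP_{0,0}$ is the only checkpoint that appears on none of the four recovery lines.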

B.  Deleting  Vertices  During  Rollback  Recovery 

Existing vertices on a checkpoint graph, for example $CP_{2,2}$ in Fig. 4, may be deleted due
to rollback recovery. Let $G_e$ denote the extended CPG during recovery, $G = (V, E)$ denote
the subgraph of $G_e$ without the virtual checkpoints, and $G^- = (V^-, E^-)$ denote the CPG
immediately after recovery. By definition, $M^*(G_e)$ is the local recovery line. Define a strict
filter corresponding to an antichain $A$ in a poset $P$ as

$\mathcal{F}_s(A) = \mathcal{F}(A) \setminus A$.

According to the recovery protocol, we have $G^- = G - \mathcal{F}_s(M^*(G_e))$. Let $M^*(G_e) = M_r \cup M_v$
where $M_r = M^*(G_e) \cap V$ and $M_v = M^*(G_e) \setminus M_r$⁴, and define

$T_1 = \{n_{p(u)} : u \in M_r\}$, $T_2 = \{n_{p(u)} : u \in M_v\}$.

In other words, $T_1$ contains vertex $n_i$ for each processor $p_i$ which contributes a real checkpoint
to the local recovery line. Parallel to the definitions of $e_i$, $n_i$, $U$, $\hat{G}$, $T_1$ and $T_2$ for $G$, we
define $e_i^-$, $n_i^-$, $U^-$, $\hat{G}^-$, $T_1^-$ and $T_2^-$ for $G^-$. Clearly, $T_2^- = T_2$. Property 2 gives
several properties of $G$ and $G^-$.

PROPERTY 2

(a) For any $u \in M_r$, $u$ is maximal in $G^-$;

(b) $M^*(\hat{G} - T_1) = M_r \cup T_2$;

(c) $G \in \mathcal{G}_s(G^-)$.

Proof. (a) For any $u \in M_r$, $u$ is on the local recovery line. All the vertices “greater
than”, or “strictly reachable from”, $u$ are deleted from the checkpoint graph by the rollback
propagation algorithm, and therefore $u$ is maximal in $G^-$ after recovery.

(b) Since $G_e \in \mathcal{G}_s(G)$ and $M^*(G_e) = M_r \cup M_v$, we can show $M_r \cup T_2$ forms the MM-chain
of $\hat{G} - T_1$ by following the proof of Lemma 5.

(c) In order to prove $G \in \mathcal{G}_s(G^-)$, we have to show that all the vertices in $\mathcal{F}_s(M^*(G_e))$
and their associated edges can be added back to $G^-$ to reconstruct $G$ without violating
Rules 1 and 2. Rule 1 is obviously satisfied. By always adding the “smaller” vertices first,
Rule 2 is enforced among the vertices in $\mathcal{F}_s(M^*(G_e))$ during the process. Suppose Rule 2 is
violated when $v \in \mathcal{F}_s(M^*(G_e))$ is being added, i.e., there exists $u \in G^-$ such that an edge
is drawn from $v$ to $u$. By the definition of $\mathcal{F}_s(M^*(G_e))$, $v < u$ and $v \in \mathcal{F}_s(M^*(G_e))$ implies
$u \in \mathcal{F}_s(M^*(G_e))$, contradicting the fact that $G^- \cap \mathcal{F}_s(M^*(G_e)) = \emptyset$. Therefore, Rule 2 is
also satisfied and $G \in \mathcal{G}_s(G^-)$. □

⁴ The subscript “r” stands for real checkpoints and “v” for virtual checkpoints.
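The relation $G^- = G - \mathcal{F}_s(M^*(G_e))$ used in this subsection amounts to deleting every vertex strictly reachable from the local recovery line. The Python sketch below illustrates this step on a graph that already contains the recovery line; the dictionary encoding and the name `prune_after_rollback` are assumptions of this sketch, and the separate removal of virtual checkpoints is not modelled.

```python
def prune_after_rollback(edges, line):
    """Return the graph with every vertex strictly above `line` removed."""
    doomed = set()
    stack = [w for v in line for w in edges[v]]
    while stack:
        v = stack.pop()
        if v not in doomed:
            doomed.add(v)
            stack.extend(edges[v])
    # Keep the surviving vertices and drop edges into deleted vertices.
    return {v: succs - doomed for v, succs in edges.items() if v not in doomed}


cpg = {(0, 0): {(0, 1)}, (0, 1): {(1, 1)}, (1, 0): {(1, 1)}, (1, 1): set()}
after = prune_after_rollback(cpg, [(0, 1), (1, 0)])
```

After pruning, every checkpoint on the recovery line has no outgoing edge, in agreement with Property 2(a).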

We now prove Lemma 6 which shows a way to transform a recovery line of $\hat{G}^- - W^-$,
$W^- \subseteq U^-$, to a recovery line of $\hat{G} - W$, $W \subseteq U$.

LEMMA 6 For checkpoint graphs $G = (V, E)$ and $G^- = (V^-, E^-)$ as defined at the
beginning of this subsection and any $v \in V^-$, if $v \in M^*(\hat{G}^- - W^-)$ for some $W^- \subseteq U^-$, then
$v \in M^*(\hat{G} - W)$ for some $W \subseteq U$.

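Proof. Partition $M^*(G^- - W^-) = M_1 \cup M_2 \cup M_3$ where $M_1 = M^*(G^- - W^-) \cap V^-$,
$M_2 = M^*(G^- - W^-) \setminus M_1 \setminus T_1^-$ and $M_3 = M^*(G^- - W^-) \setminus M_1 \setminus M_2$. Clearly, $v \in M_1$. Fig. 7
illustrates the above notation. Also define

$$M_4 = \{ e_{p(u)} : n_{p(u)} \in M_3 \}$$

$$B = \{ n_{p(u)} : u \in M_1 \}$$

We want to prove that $M_1 \cup M_2 \cup M_4 = M^*(G - (T_1 \cup B))$.

First we show $M_1 \cup M_2 \cup M_4 \in \mathcal{M}(G - (T_1 \cup B))$. By the definition of $M_4$, $M_4 \preceq M_3$ and
thus $\mathcal{I}(M_4) \subseteq \mathcal{I}(M_3)$. Since $M_1 \cup M_2 \cup M_3$ forms an antichain, $(M_1 \cup M_2) \cap \mathcal{I}(M_3) = \emptyset$. So
$(M_1 \cup M_2) \cap \mathcal{I}(M_4) = \emptyset$. Now consider the graph $G^- - (T_1^- \cup B)$. $\mathcal{F}(M_4) = \emptyset$ because vertices in
$M_4$ are all maximal according to Property 2(a). Therefore, $M_1 \cup M_2 \cup M_4 \in \mathcal{M}(G^- - (T_1^- \cup B))$.
It is not hard to see that $G \in \mathcal{G}_s(G^-)$ (Property 2(c)) implies $G - (T_1 \cup B) \in \mathcal{G}_s(G^- - (T_1^- \cup B))$.
By Lemma 4(c), $M_1 \cup M_2 \cup M_4 \in \mathcal{M}(G - (T_1 \cup B))$.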

Now suppose $M_1 \cup M_2 \cup M_4 \neq M^*(G - (T_1 \cup B))$. Since $G - T_1 \in \mathcal{G}_s(G - (T_1 \cup B))$,
$M^*(G - (T_1 \cup B)) \preceq M^*(G - T_1)$ by Lemma 4(e). The fact that $M_2 \cup M_4 \subseteq T_2 \cup M_r =
M^*(G - T_1)$ (Property 2(b)) and $M_1 \cup M_2 \cup M_4 \in \mathcal{M}(G - (T_1 \cup B))$ implies that $M_2 \cup M_4 \subseteq
M^*(G - (T_1 \cup B))$. Therefore, there must exist $M_1' = M^*(G - (T_1 \cup B)) \setminus (M_2 \cup M_4)$ such that
$M_1 \preceq M_1'$ and $M_1 \neq M_1'$. In fact, $M_1' \subseteq V^-$ because $M_1' \preceq M^*(G - (T_1 \cup B)) \preceq M^*(G - T_1)$.
Now consider the graph $G^- - W^-$. $\mathcal{F}(M_1') \subseteq \mathcal{F}(M_1)$ and $(M_2 \cup M_3) \cap \mathcal{F}(M_1) = \emptyset$ lead
to $(M_2 \cup M_3) \cap \mathcal{F}(M_1') = \emptyset$. Since $M_2 \cup M_3 \subseteq U^-$ and $M_1' \subseteq V^-$, $(M_2 \cup M_3) \cap \mathcal{I}(M_1') =
\emptyset$. Therefore, $M_1' \cup M_2 \cup M_3 \in \mathcal{M}(G^- - W^-)$, $M_1 \cup M_2 \cup M_3 \preceq M_1' \cup M_2 \cup M_3$ and
$M_1 \cup M_2 \cup M_3 \neq M_1' \cup M_2 \cup M_3$, contradicting the fact that $M^*(G^- - W^-) = M_1 \cup M_2 \cup M_3$.

Therefore, $M_1 \cup M_2 \cup M_4 = M^*(G - (T_1 \cup B))$. Finally, we have $v \in M_1 \subseteq M^*(G - W)$
where $W = T_1 \cup B \subseteq U$. □

Figure 7: Transforming the potential recovery line by replacing $M_3$ with $M_4$.


C.  Potential  Recovery  Lines 

We now apply Lemmas 5 and 6 to predicting the possible intersections of a given checkpoint
graph $G$ and all its potential recovery lines. An operational session [9] is the interval between
the start of normal execution and the instance of error recovery. The entire program execution
can be viewed as consisting of several operational sessions ordered by session numbers.
By repeatedly applying Lemma 5 within the same operational session and Lemma 6 across
consecutive sessions, every potential recovery line of $G$ can be transformed to the recovery
line of one of the graphs $G - W$, $W \subseteq U$.
THEOREM 1 Given a checkpoint graph $G = (V, E)$ and $v \in V$,

$$v \in M^*(G') \text{ for some } G' = (V', E') \in \mathcal{G}_f(G)$$

if and only if

$$v \in M^*(G - W) \text{ for some } W \subseteq U.$$

Proof. The if part is trivial because $G - W \in \mathcal{G}_f(G)$. We now prove the only if part.
If $v \in M^*(G')$ for some $G' \in \mathcal{G}_f(G)$, without loss of generality, we may assume $G$ is in the
$k$th session and $G'$ belongs to the $l$th session where $l > k$. Let $G_i$ denote the checkpoint
graph at the end of the $i$th session and $G_i^-$ denote the checkpoint graph as the $i$th session
starts, i.e., immediately after the recovery or at the beginning of program execution. Clearly,
$G_i \in \mathcal{G}_s(G_i^-)$. Also let $U_i$ and $U_i^-$ denote the set of $n_j$'s for $G_i$ and $G_i^-$, respectively. Now
consider the graphs $G_l^-$ and $G_k$, and a vertex $u$ which exists during the evolution from $G_k$ to
$G_l^-$. If $u \in M^*(G_l^- - W_l^-)$ for some $W_l^- \subseteq U_l^-$, then $u \in M^*(G_{l-1} - W_{l-1})$ for some $W_{l-1} \subseteq U_{l-1}$
by Lemma 6. Since $G_{l-1} - W_{l-1} \in \mathcal{G}_s(G_{l-1}) \subseteq \mathcal{G}_s(G_{l-1}^-)$, $u \in M^*(G_{l-1}^- - W_{l-1}^-)$ for some
$W_{l-1}^- \subseteq U_{l-1}^-$ by Lemma 5. By induction, we have $u \in M^*(G_k - W_k)$ for some $W_k \subseteq U_k$.
Since $G' \in \mathcal{G}_s(G_l^-)$, $v \in M^*(G_l^- - W_l^-)$ for some $W_l^- \subseteq U_l^-$ by Lemma 5. By the above
induction result, $v \in M^*(G_k - W_k)$ for some $W_k \subseteq U_k$. Finally, by Lemma 5 again, we have
$v \in M^*(G - W)$ for some $W \subseteq U$ because $G_k - W_k \in \mathcal{G}_s(G_k) \subseteq \mathcal{G}_s(G)$. □

Theorem 1 shows that if any checkpoint on a given checkpoint graph $G$ may be useful for
future recovery, it must belong to the recovery line of one of the $2^N$ “immediate supergraphs”
of $G$. Therefore, if we apply the rollback propagation algorithm to each of the $2^N$ graphs
$G - W$, $W \subseteq U$, and take the union of all the resulting recovery lines, we can obtain the
set of non-discardable checkpoints. However, this is an exponential algorithm and may be
unacceptable for applications with a large number of processors. The next section describes
how this complexity can be reduced further.
D. Deleting Vertices for Discarded Checkpoints

There  is  in  fact  another  situation  where  existing  vertices  may  be  deleted.  Once  a  checkpoint 
is  determined  to  be  discardable  and  its  space  is  reclaimed,  the  corresponding  vertex  on  the 
checkpoint  graph  can  be  deleted.  However,  since  the  deletion  is  not  part  of  the  program 
execution  as  in  the  recovery  process,  we  can  not  simply  remove  all  the  edges  connected  to 
the  deleted  vertex.  The  deletion  of  a  discarded  checkpoint  v  must  follow  the  procedures 
described  below  in  order  to  preserve  the  relations  implied  through  v  among  the  remaining 
vertices. 

1.  A  new  edge  is  generated  for  each  pair  of  incoming  and  outgoing  edges  of  v. 

2.  The  source  vertices  of  all  the  incoming  edges  of  v  have  to  be  remembered.  When  an 
outgoing  edge  of  v  is  added  in  the  future,  it  is  replaced  by  outgoing  edges  from  these 
source  vertices. 

Since  none  of  the  potential  recovery  lines  can  contain  v  and  the  relations  among  all  the 
remaining  vertices  and  the  new  vertices  remain  unchanged,  the  deletion  of  v  will  not  affect 
any  potential  recovery  line. 
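The two deletion rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `CheckpointGraph`, the `redirect` table and the representation of the graph as a plain set of directed edges are hypothetical and not taken from the paper.

```python
class CheckpointGraph:
    """Hypothetical edge-set model of a checkpoint graph."""

    def __init__(self):
        self.edges = set()   # directed edges (source, target)
        self.redirect = {}   # deleted vertex -> sources of its former incoming edges

    def add_edge(self, u, v):
        # Rule 2: an outgoing edge of a deleted vertex is replaced by
        # outgoing edges from the remembered source vertices.
        if u in self.redirect:
            for s in self.redirect[u]:
                self.add_edge(s, v)
        else:
            self.edges.add((u, v))

    def delete(self, v):
        sources = {a for a, b in self.edges if b == v}
        targets = {b for a, b in self.edges if a == v}
        # Drop every edge that touches v.
        self.edges = {(a, b) for a, b in self.edges if v not in (a, b)}
        # Rule 1: one new edge per pair of incoming and outgoing edges.
        self.edges |= {(s, t) for s in sources for t in targets}
        self.redirect[v] = sources
```

For example, with edges a→v and v→b, deleting v leaves the single edge a→b, and a later outgoing edge v→c is recorded as a→c, so the ordering implied through v is preserved.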


V  PREDICTIVE CHECKPOINT SPACE RECLAMATION


By applying Lemma 1, we will show that each of the $2^N$ MM-chains in Theorem 1 can be
“synthesized” from the same set of $N$ MM-chains. An efficient algorithm is then developed
for finding the set of non-discardable checkpoints.

LEMMA 7 Given a poset $P = (S, \le)$ and $A, B \subseteq S$,

$$\min(A \cup B) = \min(\min(A) \cup B).$$

Proof. Let $\min'(A) = A \setminus \min(A)$. For each $a' \in \min'(A)$, there exists $a \in \min(A)$ such
that $a < a'$. Since $a, a' \in A \cup B$, $a' \notin \min(A \cup B)$. Also, for each non-minimal $c' \in A \cup B$,
there exists $c \in \min(A \cup B)$, $c \notin \min'(A)$, such that $c < c'$. Therefore,

$$\min(A \cup B) = \min(\min(A) \cup \min'(A) \cup B) = \min(\min(A) \cup B).$$

□
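As a sanity check of Lemma 7, the identity can be verified exhaustively on a small poset. The sketch below assumes a poset chosen purely for illustration (the divisors of 12 ordered by strict divisibility); the helper names `minimal`, `strictly_divides` and `lemma7_holds` are hypothetical.

```python
from itertools import combinations


def minimal(xs, less):
    """Elements of xs that have no strictly smaller element in xs."""
    return {x for x in xs if not any(less(y, x) for y in xs)}


def strictly_divides(a, b):
    """Strict partial order used for the illustration: a < b iff a | b and a != b."""
    return a != b and b % a == 0


S = [1, 2, 3, 4, 6, 12]
subsets = [set(c) for r in range(len(S) + 1) for c in combinations(S, r)]


def lemma7_holds():
    # Compare min(A ∪ B) with min(min(A) ∪ B) for every pair of subsets.
    return all(
        minimal(A | B, strictly_divides)
        == minimal(minimal(A, strictly_divides) | B, strictly_divides)
        for A in subsets
        for B in subsets
    )
```

Calling `lemma7_holds()` compares both sides of the identity for all 4096 pairs of subsets of this poset.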


LEMMA 8 Given a poset $P$, $M \in \mathcal{M}(P)$ and $M \preceq M_i \in \mathcal{M}(P)$ for $i \in [0, k-1]$ for any
finite $k$. Define

$$\bigwedge_{i \in [0,k-1]} M_i = (\ldots((M_0 \wedge M_1) \wedge M_2) \ldots \wedge M_{k-1}).$$

Then

(a) $M \preceq \bigwedge_{i \in [0,k-1]} M_i \in \mathcal{M}(P)$;

(b) $\bigwedge_{i \in [0,k-1]} M_i = \min\left(\bigcup_{i \in [0,k-1]} M_i\right)$.

Proof. Both parts will be proved by induction on $k$.

(a) By Lemma 1, $\mathcal{M}(P)$ is a lattice and so $M_0 \wedge M_1 \in \mathcal{M}(P)$. Also, $M \preceq M_0 \wedge M_1$ because
$M \preceq M_0$, $M \preceq M_1$ and $M_0 \wedge M_1$ is the greatest lower bound of $M_0$ and $M_1$. We have shown
the case $k = 2$ is true. Assume it is true for $k = n - 1$, i.e.,

$$M \preceq \bigwedge_{i \in [0,n-2]} M_i \in \mathcal{M}(P). \qquad (1)$$

Again, by Lemma 1,

$$\bigwedge_{i \in [0,n-1]} M_i = \Big(\bigwedge_{i \in [0,n-2]} M_i\Big) \wedge M_{n-1} \in \mathcal{M}(P).$$

Eq. (1) and $M \preceq M_{n-1}$ imply

$$M \preceq \bigwedge_{i \in [0,n-1]} M_i.$$

Therefore, it is also true for $k = n$ and so we have (a).

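(b) The case $k = 2$ is true by Lemma 1. Assume it is true for $k = n - 1$, i.e.,

$$\bigwedge_{i \in [0,n-2]} M_i = \min\Big(\bigcup_{i \in [0,n-2]} M_i\Big). \qquad (2)$$

Applying part (a), Eq. (2) and Lemma 1, we have

$$\bigwedge_{i \in [0,n-1]} M_i = \Big(\bigwedge_{i \in [0,n-2]} M_i\Big) \wedge M_{n-1} = \min\Big(\min\Big(\bigcup_{i \in [0,n-2]} M_i\Big) \cup M_{n-1}\Big).$$

Lemma 7 further gives that

$$\min\Big(\min\Big(\bigcup_{i \in [0,n-2]} M_i\Big) \cup M_{n-1}\Big) = \min\Big(\bigcup_{i \in [0,n-2]} M_i \cup M_{n-1}\Big) = \min\Big(\bigcup_{i \in [0,n-1]} M_i\Big).$$

Therefore, by induction, part (b) is true. □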

LEMMA 9 For every $W \subseteq U$,

$$M^*(G - W) = \min\Big(\bigcup_{n_i \in W} M^*(G - n_i)\Big).$$

Proof. If there are $k$ vertices in the set $W$, without loss of generality, we may assume
they are $z_0, z_1, \ldots, z_{k-1}$, i.e., $\{n_i : n_i \in W\} = \{z_j : j \in [0, k-1]\}$. Since $G - z_j \in \mathcal{G}_s(G - W)$,
$M^*(G - W) \preceq M^*(G - z_j)$ for all $j \in [0, k-1]$ by Lemma 4(e).

Now consider the graph $G$. $G \in \mathcal{G}_s(G - z_j)$ implies that $M^*(G - z_j) \in \mathcal{M}(G)$ for
$j \in [0, k-1]$. Similarly, $M^*(G - W) \in \mathcal{M}(G)$. By Lemma 8(a) and (b),

$$M^*(G - W) \preceq \bigwedge_{j \in [0,k-1]} M^*(G - z_j) = \min\Big(\bigcup_{j \in [0,k-1]} M^*(G - z_j)\Big) \in \mathcal{M}(G). \qquad (3)$$

Moreover, for every $j \in [0, k-1]$, there exists $u \in M^*(G - z_j)$ with $p(u) = p(z_j)$ such that
$u < z_j$. Since $u \in \bigcup_{j \in [0,k-1]} M^*(G - z_j)$, $z_j \notin \min(\bigcup_{j \in [0,k-1]} M^*(G - z_j))$. We have, by
Lemma 4(d), $\min(\bigcup_{j \in [0,k-1]} M^*(G - z_j)) \in \mathcal{M}(G - W)$ and therefore

$$\min\Big(\bigcup_{j \in [0,k-1]} M^*(G - z_j)\Big) \preceq M^*(G - W).$$

Together with Eq. (3), we have proved

$$M^*(G - W) = \min\Big(\bigcup_{j \in [0,k-1]} M^*(G - z_j)\Big) = \min\Big(\bigcup_{n_i \in W} M^*(G - n_i)\Big).$$

□


In particular, the global recovery line, $M^*(G)$, can be obtained by letting $W = U$:

$$M^*(G) = \min\Big(\bigcup_{i \in [0,N-1]} M^*(G - n_i)\Big).$$

Let $N_D(G)$ denote the set of non-discardable checkpoints of $G$. Theorem 2 states a major
result of this paper in that it provides the basis for an efficient algorithm to find all
non-discardable checkpoints.

THEOREM 2 Given a checkpoint graph $G = (V, E)$,

$$N_D(G) = \bigcup_{i \in [0,N-1]} M^*(G - n_i) \cap V.$$

Proof. For any $v \in \bigcup_{i \in [0,N-1]} M^*(G - n_i) \cap V$, $v \in M^*(G - n_i)$ for some $i \in [0, N-1]$.
Since $G - n_i \in \mathcal{G}_f(G)$, $v \in N_D(G)$ by definition. Thus $\bigcup_{i \in [0,N-1]} M^*(G - n_i) \cap V \subseteq N_D(G)$.

Conversely, for any $v \in N_D(G)$, $v \in V$ and $v \in M^*(G - W)$ for some $W \subseteq U$ by
Theorem 1. Lemma 9 further gives that

$$v \in \min\Big(\bigcup_{n_i \in W} M^*(G - n_i)\Big) \subseteq \bigcup_{n_i \in W} M^*(G - n_i) \subseteq \bigcup_{i \in [0,N-1]} M^*(G - n_i).$$

Therefore, $N_D(G) \subseteq \bigcup_{i \in [0,N-1]} M^*(G - n_i) \cap V$ and so we have

$$N_D(G) = \bigcup_{i \in [0,N-1]} M^*(G - n_i) \cap V.$$

□

Based on Theorem 2 we now present the Predictive Checkpoint Space Reclamation (PCSR)
algorithm in Fig. 8. The algorithm is of complexity $O(N|E|)$, where $|E|$ is the total number
of edges in the checkpoint graph.


/* N is the number of processors */

/* G and n_i are as defined in Fig. 5 */

for each i ∈ [0, N − 1] {

    apply the rollback propagation algorithm to the checkpoint graph G − n_i to
    find the recovery line;

    the checkpoints in the intersection of G and the recovery line are included in
    the set N_D(G);

}

all the checkpoints not in N_D(G) can be reclaimed.


Figure  8:  The  Predictive  Checkpoint  Space  Reclamation  Algorithm. 
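The loop in Fig. 8 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes that each process is a list of checkpoint names ordered oldest first, that `virtual[i]` plays the role of $n_i$, that an edge (u, v) means v is strictly reachable from u, and that the first checkpoint of every process has no incoming edge. The function `recovery_line` is a simplified stand-in for the rollback propagation algorithm of Fig. 3; all names are hypothetical.

```python
from collections import defaultdict


def recovery_line(chains, edges):
    """Latest antichain with one vertex per chain (simplified rollback propagation)."""
    present = {v for chain in chains for v in chain}
    succ = defaultdict(set)
    for chain in chains:
        for a, b in zip(chain, chain[1:]):
            succ[a].add(b)            # consecutive checkpoints of one process
    for a, b in edges:
        if a in present and b in present:
            succ[a].add(b)            # message edges between surviving vertices

    marked = set()

    def mark_from(v):
        # Mark every vertex strictly reachable from v.
        stack = list(succ[v])
        while stack:
            w = stack.pop()
            if w not in marked:
                marked.add(w)
                stack.extend(succ[w])

    root = [len(chain) - 1 for chain in chains]
    for chain, r in zip(chains, root):
        mark_from(chain[r])
    changed = True
    while changed:
        changed = False
        for i, chain in enumerate(chains):
            if chain[root[i]] in marked:
                while chain[root[i]] in marked:
                    if root[i] == 0:
                        raise ValueError("initial checkpoint is reachable")
                    root[i] -= 1      # roll back to the previous checkpoint
                mark_from(chain[root[i]])
                changed = True
    return {chain[r] for chain, r in zip(chains, root)}


def pcsr(real_chains, virtual, edges):
    """Union over i of the recovery line of G - n_i, restricted to real checkpoints."""
    real = {v for chain in real_chains for v in chain}
    keep = set()
    for i in range(len(real_chains)):
        chains = [chain + [virtual[j]] if j != i else list(chain)
                  for j, chain in enumerate(real_chains)]
        keep |= recovery_line(chains, edges) & real
    return keep
```

As a usage example, take two processes with checkpoints a0, a1 and b0, b1 and edges b0→a1, a1→b1, b1→n0. The recovery line of the real graph is {a0, b0}, so no checkpoint is older than it, yet `pcsr` returns {a0, b0}, which means a1 and b1 can be reclaimed.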

An  example  illustrating  the  execution  of  the  PCSR  algorithm  on  the  checkpoint  graph  in 
Fig.  2(b)  is  shown  in  Fig.  9.  The  traditional  checkpoint  space  reclamation  algorithm  can  only 
reclaim  the  first  checkpoint  of  each  processor.  The  PCSR  algorithm,  however,  determines 
that  all  the  checkpoints  marked  “X”  are  discardable. 


VI  THE MAXIMUM NUMBER OF NON-DISCARDABLE CHECKPOINTS


Traditionally, only obsolete checkpoints can be discarded. Since it is possible for the
domino effect to persist during program execution, a common perception is that a large
number of non-obsolete checkpoints may have to be kept and the space overhead may
constantly grow as a program proceeds. In a sense, this is a more serious disadvantage than
slower recovery due to the domino effect because it results in unpredictable space overhead
during normal execution. Theorem 2 not only identifies the minimum set of non-discardable
checkpoints but also places an upper bound $N^2$ on the number of non-discardable checkpoints
for a general checkpoint graph because each $M^*(G - n_i)$, $i \in [0, N-1]$, consists of $N$
checkpoints. A tighter upper bound obviously exists based on the following observations:
1. $M^*(G - n_i)$ may contain vertices from the set $U$, but we are only concerned about
vertices in $G$;

2. the $M^*(G - n_i)$'s may not be mutually disjoint;

3. if the last vertex $e_i$ on chain $C_i$ is maximal, $M^*(G - n_i)$ will contribute only a single
vertex to the set $N_D(G)$, i.e., $e_i$ itself.

Figure 9: Execution of the PCSR algorithm (a) $G - n_0$ (b) $G - n_1$ (c) $G - n_2$ (d) $G - n_3$.
(Shaded checkpoints belong to the recovery lines and checkpoints marked “X” are
discardable.)


The following property addresses the implicit relations among the $M^*(G - n_i)$'s.

PROPERTY 3 Let $m_{i,j}$ denote the vertex in $M^*(G - n_i)$ from processor $p_j$. For $i, j \in
[0, N-1]$ and $i \neq j$, if $m_{i,j} \neq n_j$ and $m_{j,i} \neq n_i$, then $M^*(G - n_i) = M^*(G - n_j)$.
Proof. $m_{i,j} \neq n_j$ implies $M^*(G - n_i) \subseteq G - n_i - n_j$. By Lemma 4(d) and (c), $M^*(G - n_i) \in
\mathcal{M}(G - n_i - n_j) \subseteq \mathcal{M}(G - n_j)$ and so

$$M^*(G - n_i) \preceq M^*(G - n_j).$$

Similarly, $m_{j,i} \neq n_i$ leads to

$$M^*(G - n_j) \preceq M^*(G - n_i).$$

Since $(\mathcal{M}(G - n_i - n_j), \preceq)$ forms a poset (Lemma 1(b)), we have

$$M^*(G - n_i) = M^*(G - n_j).$$

□

It should be noted that the PCSR algorithm can be further improved by applying
Property 3. Inside the loop in Fig. 8, suppose we have found the recovery line $M^*(G - n_i)$. Define
the index set $T$ as

$$T = \{ j : m_{i,j} \neq n_j,\ j \in [0, N-1] \text{ and } j > i \}.$$

Then for each later loop index $j \in T$, the rollback propagation algorithm can be aborted
as soon as any checkpoint from processor $p_i$ is marked, because this would mean $m_{j,i} \neq n_i$ and
$M^*(G - n_j)$ is exactly the same as $M^*(G - n_i)$.


THEOREM 3 For any checkpoint graph $G = (V, E)$ in a system with $N$ processors,

$$|N_D(G)| \le \frac{N(N+1)}{2}.$$


Proof. By Theorem 2, we only have to consider the $N^2$ vertices $m_{i,j}$, $i, j \in [0, N-1]$. For
each $i \in [0, N-1]$, $m_{i,i} \in V$ and contributes one vertex to $N_D(G)$. Since all the $m_{i,i}$'s come
from different processors, $N_D(G)$ now consists of $N$ vertices. For the remaining $N^2 - N$
vertices with $i \neq j$, we consider each pair $m_{i,j}$ and $m_{j,i}$ at a time and there are $(N^2 - N)/2$
such pairs. We distinguish three cases:

Case 1: $m_{i,j} = n_j$ and $m_{j,i} = n_i$. Neither $m_{i,j}$ nor $m_{j,i}$ belongs to $N_D(G)$.

Case 2: $m_{i,j} = n_j$ and $m_{j,i} \neq n_i$, or $m_{i,j} \neq n_j$ and $m_{j,i} = n_i$. This pair will possibly add one
new vertex to $N_D(G)$.

Case 3: $m_{i,j} \neq n_j$ and $m_{j,i} \neq n_i$. It follows that $M^*(G - n_i) = M^*(G - n_j)$ by Property 3,
and so $m_{i,j} = m_{j,j}$ and $m_{j,i} = m_{i,i}$. Since $m_{j,j}$ and $m_{i,i}$ are already in $N_D(G)$, this case
does not increase the size of $N_D(G)$.

Therefore, each of the $(N^2 - N)/2$ pairs can contribute at most one new vertex to $N_D(G)$.
We then have

$$|N_D(G)| \le N + \frac{N^2 - N}{2} = \frac{N(N+1)}{2}.$$


One may argue that the upper bound derived in Theorem 3 is still of order $N^2$. We will
next show that $N(N+1)/2$ is in fact the least upper bound, i.e., the maximum, because
for any $N$ we can construct a checkpoint graph, $G^*_N$, as shown in Fig. 10 to achieve this
upper bound. By applying the PCSR algorithm shown in Fig. 8, it is not hard to see that
all the $N(N+1)/2$ vertices in Fig. 10 are non-discardable.


Figure 10: $G^*_N$: The checkpoint graph with $N(N+1)/2$ non-discardable checkpoints.
When a checkpoint graph is given, we can further reduce the maximum by counting the
number, $L$, of maximal vertices in the set of $e_i$'s (Fig. 5). Recall that if $e_i$ is maximal,
$m_{i,i} = e_i$ and $m_{i,j} = n_j$ for $j \neq i$. Therefore, in the discussion for each pair of $m_{i,j}$ and $m_{j,i}$ in
the proof of Theorem 3, the case when both $e_i$ and $e_j$ are maximal corresponds to Case 1.
The maximum then becomes

$$|N_D(G)| \le N + \binom{N}{2} - \binom{L}{2}.$$

In particular when $L = N$, $|N_D(G)| = N$, which corresponds to the case of coordinated
checkpointing.
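The bound is easy to evaluate numerically. The short sketch below is only an illustration; the function name `max_non_discardable` and its parameter names are hypothetical. It computes $N + \binom{N}{2} - \binom{L}{2}$, which reduces to $N(N+1)/2$ when fewer than two of the $e_i$'s are maximal.

```python
from math import comb


def max_non_discardable(n, maximal_last=0):
    """Upper bound on |N_D(G)| for n processors when `maximal_last`
    of the last checkpoints e_i are maximal vertices."""
    return n + comb(n, 2) - comb(maximal_last, 2)
```

For instance, `max_non_discardable(8)` gives 36 and `max_non_discardable(6)` gives 21, while `max_non_discardable(8, 8)` gives 8, the coordinated-checkpointing case of one checkpoint per processor.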


VII  EXPERIMENTAL  RESULTS 


Four  parallel  programs  are  used  to  illustrate  the  checkpoint  space  reclamation  capabilities 
and  benefits  of  the  PCSR  algorithm.  Two  are  CAD  programs  written  for  the  Intel  iPSC/2 
hypercube:  a  Cell  placement  program  and  a  Channel  router  program.  The  other  two  are 
the  Knight  tour  program  and  the  N-queen  program  written  in  the  Chare  Kernel  language 
which  has  been  developed  as  a  medium-grain,  message-driven  and  machine-independent 
parallel  language  [24].  We  use  the  version  of  Chare  Kernel  on  the  Encore  Multimax  510 
multiprocessor. The periodic checkpointing routine is implemented as the interrupt service
routine for the UNIX alarm(T) system call, where T is the checkpoint interval in seconds. Each
processor sets the alarm at the very beginning of the program and invokes the checkpointing
routine independently. A concurrent checkpointing algorithm as described by Li et al. [25] is assumed
so  that  the  program  thread  is  interrupted  for  a  small,  fixed  amount  of  time  (0.1  seconds) 
for  taking  each  checkpoint,  after  which  the  checkpointing  thread  executes  concurrently  with 
the  program  thread  to  finish  the  checkpointing.  Communication  traces  are  collected  by 
intercepting  the  “send”  and  “receive”  system  calls.  Communication  trace-driven  simulation 
is  then  performed  to  obtain  the  results. 
The number of processors used and the total execution time for each program are listed in
Table 1. The checkpoint interval for each program is arbitrarily chosen to be approximately
one tenth of the execution time.


Table 1: Execution and checkpoint parameters of the programs.

Benchmark program   Number of processors   Machine                  Execution time (sec)   Checkpoint interval (sec)
Cell placement      8                      Intel iPSC/2 hypercube   322.7                  35
Channel router      8                      Intel iPSC/2 hypercube   442.0                  40
Knight tour         6                      Encore Multimax          273.2                  30
N-queen             6                      Encore Multimax          1625.1                 150

Figs. 11-14 compare our PCSR algorithm with the traditional checkpoint space reclamation
algorithm for typical executions of the four programs. Since obsolete checkpoints must
be discardable, the curves for non-discardable checkpoints are always below the curves for
non-obsolete checkpoints. Note that the curves do not show the actual number of checkpoints
that would be kept on stable storage during program execution because the checkpoint space
reclamation algorithm would not be continuously active throughout the program execution.
Instead, they show the number of checkpoints which have to be kept if the algorithm is invoked
after a certain number of checkpoints have been taken. The domino effect is illustrated by
the linear increase in the number of non-obsolete checkpoints as the total number of checkpoints
increases. These figures show that the PCSR algorithm is particularly effective when
the domino effect persists. The largest difference between the number of non-obsolete checkpoints
and the number of non-discardable checkpoints for each figure is: 39 versus 7 for Cell
placement, 40 versus 12 for Channel router, 24 versus 10 for Knight tour and 41 versus 5 for
N-queen.
Figure 11: Non-obsolete checkpoints and non-discardable checkpoints for the Cell placement
program.

Figure 12: Non-obsolete checkpoints and non-discardable checkpoints for the Channel router
program.

VIII  CONCLUSIONS


The problem of finding recovery lines for independent checkpointing schemes was formulated
as determining the maximal maximum-sized antichains of partially ordered sets.
We presented a method for predicting the possibility of any checkpoint becoming a member
of a future recovery line, and showed that some checkpoints will never be needed for
recovery, so their space can be reclaimed. Based on the algorithm for finding the recovery
lines, a new checkpoint space reclamation algorithm, with complexity linear in the number
of processors $N$ and linear in the number of edges in the checkpoint graph, was developed
for determining the set of non-discardable checkpoints. The maximum, $N(N+1)/2$, on the
number of non-discardable checkpoints for an arbitrary checkpoint graph was also derived
to show that the space overhead for maintaining multiple checkpoints is bounded even when
the domino effect persists during program execution. Communication trace-driven simulation
for four parallel programs illustrated that the algorithm can be effective in significantly
reducing the number of retained checkpoints.